5 research outputs found

    Algorithm Based Fault Tolerance: A Perspective from Algorithmic and Communication Characteristics of Parallel Algorithms

    The checkpoint and recovery cost imposed by checkpoint/restart (CP/R) is a crucial performance issue for high-performance computing (HPC) applications. In comparison, Algorithm-Based Fault Tolerance (ABFT) is a promising fault tolerance method with low recovery overhead, but it lacks universal applicability, i.e., each scheme is tied to a specific application or algorithm. To date, ABFT research has focused on providing fault tolerance for matrix-based algorithms for linear systems. Consequently, a comprehensive exploration of ABFT is needed to widen its scope to other types of parallel algorithms and applications. In this thesis, we go beyond traditional ABFT and focus on parallel applications that it does not cover. Rather than treating a single application at a time, we consider the algorithmic and communication characteristics of a class of parallel applications to design efficient fault tolerance and recovery strategies for that class. The communication characteristics determine how to replicate the fault recovery data of a process (which we call the critical data) across other processes, and the algorithmic characteristics determine which application-specific data must be replicated to minimize fault tolerance and recovery cost. Based on communication characteristics, parallel algorithms can be broadly classified as (i) embarrassingly parallel algorithms, where processes interact rarely, and (ii) communication-intensive parallel algorithms, where processes interact significantly. In this thesis, through different case studies, we design ABFT for these two categories of algorithms by considering their algorithmic and communication characteristics. 
Analysis of these parallel algorithms reveals that a process contains sufficient information to rebuild its computational state if a failure occurs during the computation. We define this information as critical data: the minimal application-level data that must be saved (securely) so that a failed process can be fully recovered from its most recent consistent state. How the communication dependencies among processes are used to replicate this fault recovery data directly affects the system's fault tolerance performance. We propose ABFT for parallel search algorithms, which belong to the class of embarrassingly parallel algorithms. Parallel search algorithms are well-known solution techniques for discrete optimization problems (DOPs). DOPs cover a broad class of (parallel) applications, ranging from AI search problems and the traveling salesman problem to computer games such as chess. As a case study, we choose the parallel iterative deepening A* (PIDA*) algorithm and integrate application-level fault tolerance into it by periodically replicating critical data to make it resilient. In the category of communication-intensive algorithms, we choose dynamic programming (DP), a widely used algorithmic paradigm for optimization problems. We take the parallel DP algorithm as a case study and propose ABFT for such applications. We present a detailed analysis of the characteristics of parallel DP algorithms and show that their algorithmic features reduce the cardinality of the critical data to a single data item for a task that depends on n data items. We demonstrate the idea with two popular DP applications: (i) the traveling salesman problem (TSP), and (ii) the longest common subsequence (LCS) problem. Minimal storage and recovery overhead are the prime concerns in fault tolerance design. 
In that regard, we demonstrate that the critical data can be optimized further for a particular class of DP problems, where the degree of dependency of a subproblem is small and fixed at each iteration. We discuss this with the 0/1 knapsack problem as a case study and propose an ABFT scheme in which, instead of replicating the critical data, we replicate a bit-vector flag in a peer process's memory; the flag is later used to rebuild the lost data of a failed process. Theoretical and experimental results demonstrate that our proposed methods perform significantly better than conventional CP/R in terms of fault tolerance and recovery overheads, and also in storage overhead, in the presence of single and multiple simultaneous failures.

    An End-to-End Authentication Mechanism for Wireless Body Area Networks

    A Wireless Body Area Network (WBAN) supports high-quality healthcare services by enabling remote and continuous monitoring of patients' health conditions. The security and privacy of the sensitive health-related data transmitted through the WBAN must be preserved to maximize its benefits. In this regard, user authentication, which verifies the identities of the entities involved in the communication process, is one of the primary mechanisms for protecting health data. Since a WBAN carries crucial health data, every entity engaged in the data transfer process must be authenticated. The literature lacks an end-to-end user authentication mechanism that covers each communicating party. Besides, most existing user authentication mechanisms are designed on the assumption that the patient's mobile phone is trusted. In reality, a patient's mobile phone can be stolen or compromised by malware and thus behave maliciously. Our work addresses these drawbacks and proposes an end-to-end user authentication and session key agreement scheme between sensor nodes and medical experts in a scenario where the patient's mobile phone is semi-trusted. We present a formal security analysis using BAN logic, together with an informal security analysis of the proposed scheme. Both analyses indicate that our method is robust against well-known security attacks. In addition, our scheme achieves computation and communication costs comparable to those of related existing works. The simulation shows that our method preserves satisfactory network performance.

    A multi‐device user authentication mechanism for Internet of Things

    The advent of the Internet of Things (IoT) enables customized services that ease users' day‐to‐day activities by utilizing information obtained through the internet connectivity of low‐powered sensing devices. Due to the diversity and resource constraints of participating devices, IoT is vulnerable to security attacks. Consequently, authentication is the fundamental network security measure for using IoT services. The resource constraints of IoT devices make designing robust and secure authentication mechanisms challenging. Besides, existing user authentication mechanisms are designed on the assumption that a user always accesses an IoT environment using one particular device. However, most users now employ multiple devices to access the internet; consequently, an authentication mechanism is needed that can handle this diversity. This paper addresses this limitation and proposes a new One‐Time Password (OTP)‐based user authentication scheme that supports user access from multiple devices in an IoT environment. We verify the proposed scheme using the widely used BAN logic, the AVISPA tool, and informal security analysis, which indicate that our scheme preserves the necessary security features. Comparative performance analysis shows that our scheme achieves computation, storage, and communication costs comparable to those of existing works. Moreover, simulation results demonstrate that the proposed method also sustains satisfactory network performance.